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Abstract 

Markov chain Monte Carlo is a method of producing a correlated sample 
in order to estimate features of a target distribution via ergodic averages. A 
fundamental question is when should sampling stop? That is, when are the 
ergodic averages good estimates of the desired quantities? We consider a method 
that stops the simulation when the width of a confidence interval based on an 
ergodic average is less than a user-specified value. Hence calculating a Monte 
Carlo standard error is a critical step in assessing the simulation output. We 
consider the regenerative simulation and batch means methods of estimating the 
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variance of the asymptotic normal distribution. We give sufficient conditions 
for the strong consistency of both methods and investigate their finite sample 
properties in a variety of examples. 



1 Introduction 

Suppose our goal is to calculate E w g := § x g(x)n(dx) with n a probability distribution 
having support X and g a real-valued, 7r-integrable function. Also, suppose n is such 
that Markov chain Monte Carlo (MCMC) is the only viable method for estimating E v g. 

Let X = {X , Xi, X 2 , . . . } be a time-homogeneous, aperiodic, 7r-irreducible, positive 
Harr is recurrent Markov chain with state space (X, B(X)) and invariant distribution n. 



(See 



Mevn and Tweedid (|1993[ ) for definitions.) In this case, we say that X is Harris 
ergodic and the Ergodic Theorem implies that, with probability 1, 

j n— 1 

- y g(Xi) -»• E n g as w ^ oo. (1) 



n-1 

1 

9n 



n 

i=0 



Given an MCMC algorithm that simulates X it is conceptually easy to generate large 
amounts of data and use g n to obtain an arbitrarily precise estimate of E n g. 

There are several methods for deciding when n is sufficiently large; i.e., when to 
terminate the simulation. The simplest is to terminate the computation whenever 
patience runs out. This approach is unsatisfactory since the user would not have any 
idea about the accuracy of g n . Alternatively, with several preliminary (and necessarily 
short) runs the user might be able to make an informed guess about the variability in 
g n and hence make an a priori choice of n. Another method would be to monitor the 
sequence of g n until it appears to have stabilized. None of these methods are automated 
and hence are inefficient uses of user time and Monte Carlo resources. Moreover, they 
provide only a point estimate of E n g without additional work. 
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Convergence diagnostics are also sometimes used to terminate the simulation ([Cowles and Carlin 



1996). Some convergence diagnostics are available in software, e.g. the R package boa, 



and hence may be considered automated. However, none of the diagnostics of which 
we are aware explicitly address how well g n estimates E n g; this is discussed again in 
subsection 14.1.11 

An alternative is to calculate a Monte Carlo standard error and use it to terminate 
the simulation when the width of a confidence interval falls below a specified value. 
Under regularity conditions (see Section EJ) the Markov chain X and function g will 
admit a central limit theorem (CLT); that is, 



(2) 



as n 



oo where a 2 := vax n {g(X )} + 2 cov Tr{g(X ), g(Xi)}. Given an estimate of 



a 2 , say a 2 , we can form a confidence interval for E n g. If this interval is too large then 



the value of n is increased and simulation continues until the interval 
small; this is a commo n way of choosing n (e.g., see 



Jones and Hobert 



Fishman 



1996 



is suffi c iently 



Gever 



1992; 



2001). Notice that the final Monte Carlo sample size is random. We 
study sequential fixed-width methods which formalize this approach. In particular, the 
simulation terminates the first time 



a 



t* — 1= +p(n) < e 



n 



(3) 



where t* is an appropriate quantile, p(n) > on Z + and e > is the desired half-width. 
The role of p is to ensure that the simulation is not terminated prematurely due to a 
poor estimate of a 2 . One possibility is to fix n* > and take p(n) = el{n < n*) where 
/ is the usual indicator function. 



Sequential statistical procedures have a long history; see 



Lail (|200lh for an overview 



and commentary. Moreover, c 
intervals such as those found in 



assical approaches t o seq u entia 



Chow and Robbins 



(196E 



fixed-width confidence 



Liul (|l997h and 



Nadaj (|1969j) 



are known to work well. However, the classical procedures are not relevant to the current 
work since they assume the observations are random samples. 

In a simulation context, procedures based on Q were studied most notably by 



Glvnn and Whittl (|1992f ) who established that these procedures are asymptotically valid 
in that if our goal is to have a 100(1 — 8)% confidence interval with width 2e then 



Pr(E^ G Int[T(e)]) -^1-5 as e 







(4) 



where T(e) is the first time that © is satisfied and Int[T(e)] is the interval at this 
time. Glynn and Whitt's conditions for asymptotic validity are substantial: (i) A 
functional central limit theorem (FCLT) holds; (ii) a 2 — > a 2 with probability 1 as 
n — > oo; and (iii) p(n) = o(n~ l l 2 ). Markov chains frequently enjoy an FCLT under 
the same conditions that ensure a CLT. However, in the context of MCMC, little 
work has been done on establishing conditions for (ii) to hold. Thus one of our goals 
is to give conditions under which some common methods provide strongly consistent 
estimators of a 2 . Specifically, our conditions require the sampler to be either uniformly 
or geometrically ergodic. The MCMC community has expended co nsiderable effort in 



estab 



ishin g such mixing conditions for a vari ety of samplers; see 



(20011) and 



Roberts and Rosenthal! (1998, 



Jones and Hobert 



2004) for some references and discussion. 



We consider two methods for estimating the variance of the asymptotic normal dis- 
tribution, regenerative simulation (RS) and non-overlapping batch means (BM). Both 
have strengths and weaknesses; essentially, BM is easier to implement but RS is on 
a stronger theoretical footing. For example, when used with fixed number of batches 
BM cannot be even weakly consistent for a 2 . We give conditions for the consistency 
of RS and show that BM can provide a consistent estimation procedure by allowing 
the batch sizes to increase (in a specific way) as n increases. In this case it is denoted 
CBM to distinguish it fr om the standard fixed-batch size version which we denote BM. 

i — in 

This was addressed by iDamerdiJ (|1994T ) but, while the approach is similar, our regu- 
larity conditions on X are weaker. Also, the regularity conditions required to obtain 
strong consistency of the batch means estimator are slightly stronger than those re- 
quired by RS. Finally, it is important to note that RS and CBM do not require that X 
be stationary; hence burn-in is not required. 
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The justification of fixed-width methods is entirely asymptotic so it is not clear 
how the finite sample properties of BM, CBM, and RS compare in typical MCMC 
settings. For this reason, we conduct a simulation study in the context of two benchmark 
examples and two realistic examples, one of which is a complicated frequentist problem 
and one which involves a high- dimensional posterior. Roughly speaking, we find that 
BM performs poorly while RS and CBM are comparable. 

The rest of this article is organized as follows. Section El fixes notation and contains 
a brief discussion of some relevant Markov chain theory. In Section El we consider RS 
and CBM. Then Section 0] contains the examples. 



2 Basic Markov Chain Theory 

For n G N := {1, 2, 3, . . .} let P n (x, dy) be the n-step Markov transition kernel; that is, 
for x G X and A G S(X), P n {x, A) = Pr (X n G A\X = x). A Harris ergodic Markov 
chain X enjoys a strong form of convergence. Specifically, if A(-) is a probability measure 
on B(X) then 

||P n (A,-) -tt(0|| 10 asn^oo, (5) 

where P n (X, A) := j x P n (x, A)X(dx) and || • || is the total variation norm. Suppose there 
exists an extended real-valued function M(x) and a nonnegative decreasing function 
k(ji) on Z + such that 

\\P n (x,-) -tt(-)|| < M(x)K(n) . (6) 

When n(n) = t n for some t < 1 say X is geometrically ergodic if M is unbounded and 
uniformly ergodic if M is bounded. Polynomial ergodicity of order m where m > 
means M may be unbounded and k(u) = n~ m . 

Also, P satisfies detailed balance with respect to n if 

ir(dx)P(x,dy) = ir(dy)P(y,dx) for all x,y G X . (7) 



5 



Note that Metropolis-Hastings samplers satisfy (J7J) by construction but many Gibbs 
samplers do not. We are now in position to give conditions for the existence of a CLT. 

Theorem. Let X be a Harris ergodic Markov chain on X with invariant distribution 7r 
and suppose g : X — > R is a Borel function. Assume one of the following conditions: 

1. X is polynomially ergodic of order m > 1, E^M < oo and there exists B < oo 
such that |<?(£)| < -B almost surely; 

2. X is polynomially ergodic of order m, E^M < oo and E 7r |(yf(x)| 2+<5 < oo for some 
5 > where m5 > 2 + 5; 

3. X is geometrically ergodic and E n [g 2 (x)(\og + |g(x)|)] < oo; 

4. X is geometrically ergodic, satisfies (J7J) and E 7r (7 2 (x) < oo; or 

5. X is uniformly ergodic and E 7T g 2 (x) < oo. 
Then, for any initial distribution, as n — > oo 

v^(^-E^)^N(0,a 9 2 ). 

I 



1 I 1 



Remark 1. The theorem was proved by Ibraeimov and Lin nikl (| 



Roberts and Rosenthal! (|l997l ) (condition 4) 



Doukhan et al 



971[ ) (condition 5), 



( 19941 ) (condition 3). See 



Jones (2004) for details on conditions 1 and 2. 



Remark 2. Conditions 3, 4 and 5 of the theorem are also sufficient to guarante e the 



existence of an 



Billingslevl fjl968T ). respectively. 



'CLT; see 



Doukhan et all (Jl994j), 



Roberts and Rosenthal! ()1997f ) and 



Remark 3. The mixing conditions on the Markoy chai n X stated in Theorem El ar e not 



necessary 



Nummelin 



' or the CLT; see, for example, 



Chenl Jl999) 



Mevn and Tweediel (jl993T ) and 



(2002). However, the weaker conditions are often prohibitively difficult to 
check in situations where MCMC is appropriate. 

Remark 4. There are co nstructive techniques for v erifying the existence of an appropri- 



ate M and k from © (Me vn and Tweedie 



19931 Ch. 15). For example, one method 



of establishing geometric ergodicity requires finding a function V : X — > [l,oo) and a 
small set C G B(X) such that 

PV(x) < XV(x) + bl(x eC) V x G X (8) 

where PV(x) := J V(y)P(x,dy), < A < 1 and b < oo. Substantial effort has been 
devoted to establishi ng convergence rates for MCMC algorithms via (181 or related tech- 



mques 



(2004), 



1994J) 



For example, 



Hobert and Geverl (11 998). 



Marchev and Hobert 



(2004) 



Roberts and Rosenthal! (119 99). 



Hober 



Mira and Tier icv 



Rosenthal (1995, 



et al.l ( 
20oX 



(2002), 



Jones 



Robert 



1996) and 



ai 



1995 



Tie rnev 



d Hobert 



Roberts and Poison 



1994) ex- 



amined Gibb s sample r s whi 
20ol. 



(2000 



Gever 



(1999), 



Christensen et al 



2001 



Jarncr and Hansen (2000) 



Mevn and Tweedid (J1994J), and 
Hastings algorithms. 



Douc and Soulier 



Mengersen and Tweedie 



200^ 



Fort 



and Moulines 



Jar ner and Roberts! ((2.002), 



1996) analyzed Metropolis- 



2.1 The Split Chain 



An object that is important to the study of both RS and CBM is the split chain X' := 
{(X , So), (Xi, 5i), (X 2 , 5 2 ), . . . } which has state space X x {0, 1}. The construction of 
X' requires a minorization condition; i.e., a function s : X i— > [0, 1] for which E n s > 
and a probability measure Q such that 



P(x, A) > s{x) Q(A) for all x e X and A G B(X) . 



(9) 



When X is countable it is easy to see that ( P> ho lds b y fixing x* G X, se tting s(x) = I(x 



Rosenthal! (|l995| ) give prescriptions 



x*) and Q(-) = P(x*, •)• iMvkland et all ((1995J) and 
that are often useful for establishing Q in general spaces. Note that Q allows us to 
write P(x, dy) clS db mixture of two distributions, 

P(x, dy) = s(x) Q(dy) + [1 — s(x)] R(x, dy), 

where R(x,dy) := [1 — s(x)] -1 [P(x, dy) — s(x) Q{dy)] is the residual distribution (de- 
fine R(x, dy) as if s(x) = 1). This mixture gives us a recipe for simulating X': given 
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Xi = x, generate Si ~ Bernoulli (s(x)). If Si = 1, then draw X i+ i ~ Q(-), else draw 
Xf+i ~ R(x, •). 

The two chains, X and X' are closely related since X' will inherit properties such 
as aperiodicity and positive Harris recurrence and the sequence {Xi : i = 0, 1, ... } 
obtained from X' has the same transition probabilities as X. Also, X and X' converge 
to their respective stationary distributions at exactly the same rate. 

If Si = 1, then time i + 1 is a regeneration time when X' probabilistically restarts 
itself. Specifically, suppose we start X' with Xq ~ Q. Then each time that Si = 1, 
Xj+i ~ Let = ro < 7"i < • • • be the regeneration times. That is, set r r+ i = 
min{i > T r : = 1}. Also assume that X' is run for R tours; that is, the simulation 
is stopped the i?th time that a Si — 1. Let denote the total length of the simulation 
and N r be the length of the rth tour; that is, N r = r r — r r _i. Define 

T r — 1 



for r = 1, . . . , R. The (X r , 5V) pairs are iid since each is based on a different tour. In 
the sequel we will make r epeated use of the following lemma which generalizes Theorem 



2 of 



Hobert et al. 



Lemma 1. Let X be a Harris ergodic Markov chain with invariant distribution ir. As- 
sume that (JHI) holds and that X is geometrically ergodic. Let p > 1 be an integer. 



1. If E n \g\ 2(p 1)+s < oo for some S > then E Q Nf < oo and E Q S\ < oo. 

2. If E n \g\ 2P+s < oo for some 5 > then £ Q Xf < oo and £ Q Sf +<5 < oo. 

Proof. See Appendix |XJ □ 
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3 Output Analysis 



3.1 Regenerative Simulation 



Regenerative simulation is based on directly simulating the split chain. However, using 
the mixture ap proach described abov e is problematic since simulation from R(x, dy) 



is challenging. 



Mvkland et al 



((19951 ) suggest a method for avoiding this issue. Sup- 
pose (jHJ) holds and that the measures P(x, ■) and Q(-) admit densities k(-\x) and q(-), 
respectively. Then the following reci pe allows us to s imula te X' . Assume X ~ g(-); 



this is typically quite easy to do, see 



Mvkland et al 



( 19951 ) for some examples. Also, 



note that this means burn-in is irrelevant. Draw Aj + i ~ k(-\x), that is, draw from the 
sampler at hand, and get Si by simulating from the distribution of 5i\Xi, X i+ i with 

s(Xi)q(X i+1 ) 



Pi(6i = l\Xi,X i+1 ) 



(10) 



k{X i+ i | Xi) 

Example 1. In a slight abuse of notation let 7r also denote the density of the target 
distribution. Consider an independence Metropolis-Hastings sampler with proposal 
density v. This chain works as follows: Let the current state be X\ = x. Draw y ~ v 
and independently draw u ~ Uniform(0, 1). If 

u Tt{y)v{x) 

n(x)u(y) 

then set X i+1 = y otherwise set X i+ i = x 



Mvkl and et al 



( 19951) derive (HUJ) for this 



case. Let c > be a user-specified constant. Then conditional on an acceptance, i.e. 
Xi = x and X i+ i = y 



Pr(<5; = l\Xi = x, X,, 



i+l 



y) 



c max / flM ilM j jf m j n / zlM llM. 1 > c 

\ n(x) ' ir(y) J |_ u(x) ' v{y) J 

c max { u( x ) > / 11 max I ' *%) / < c 
1 otherwise . 

Note that we do not need to know the normalizing constants of tt or v to calculate (jllj) . 
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In discrete state spaces regenerations can be easy to identify. In particular, a re- 
generation occurs whenever the chain returns to any fixed state; for example, when 
the Metropolis-Hastings chain accepts a move to the fixed state. This regeneration 
scheme is most useful when the state space is not too large but potentially compli- 
cated; see subsection 14 .HI It will not be useful when the state space is extremely large 
because returns to the fixed state are too i nfrequent. Further practical advice on im 



pleme nt ing and automating RS is gi v en in 



1998) 



Gever and 



Thompson! ()1995h 



Brockwell 



Hobert et al 



and 



(21102) 



Hobert et al 



Gilks 



et al 



2005) and 



■Tones and Hobertl (|200lh. 

Implementation of RS is simple once we can effectively simulate the split chain. For 
example, the Ergodic Theorem implies that 

TR-l 



9r 



1 * 

i=o 



Kg 



with probability 1 as R — > oo and hence estimating E^g is routine. 

We now turn our attention to calculating a Monte Carlo standard error for g TR . Let 
Eq denote the expectation for the split chain started with Xq ~ Q{-)- Also, let N be 
the average tour length; that is, N = R' 1 Y^ r =i ^r- Since the (N r , S r ) pairs are iid the 
strong law implies with probability 1, N — > EqN\ which is finite by positive recurrence. 
If EqN"1 < oo and EqS\ < oo it follows that a CLT holds; i.e., as R — ► oo 

(12) 



% 8 -E I? )AN(0,(; 



where, as shown in 



estimator of is 



Hobert et al 



(2002J), £ = E Q {S 1 - N^gf/iEgN,) 2 . An obvious 



11 

&S ■= -]^xJ2( S r ~ 9r R N r 



r=l 
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Now consider 
f 2 - f 2 

^RS S 



1 1 



R 



^ 2 E Q (S 1 - N x E v gf ± E Q {S 1 - NtE^g) 



N 2 R 
1 1 



r=l 
R 



{E Q N X 



N 2 



J2 [(Sr - g TR N r f - E Q (S 1 - N x E,gf ± (S r - N r E„gf] 



r=l 



+ 



E Q {S 1 -N 1 E 1r gr[ m 



N 2 E Q Nl 

Using this representation and repeated applicatio n of the strong law s hows that £ RS 



£ 2 — > with probability 1 as R — > oo (also see iHobert et al. 



20021). It is typically 



difficult to check that EqN% < oo and EqS\ < oo. However, using Lemma [T] yields the 
following result. 

Proposition. Let X be a Harris ergodic Markov chain with invariant distribution n. 
Assume that E n \g\ 2+S < oo for some 6 > 0, holds and that X is geometrically 



ergodic. Then (fEZj) holds and £ 



£;? w. p. 1 as R — > oo. 



Fix e > and let 2 denote an appropriate standard normal quantile. An asymptot- 
ically valid fixed-width procedure results by terminating the simulation the first time 



z ^£ + p(R) < e 



R 



(13) 



3.2 Batch Means 

In standard batch means the output of the sampler is broken into batches of equal size 
that are assumed to be approximately independent. (This is not strictly necessary; c.f., 
the method of overlapping batch means.) Suppose the algorithm is run for a total of 
n = ab iterations (hence a = a n and b = b n are implicit functions of n) and define 



, jb-l 



for j 



i=(j-l)6 



The batch means estimate of a 2 is 



-2 

°BM 



(14) 



11 



With a fixed number of batches ()14|) is not a consistent estimator of a 1 (IGlvnn and Iglehart . 



1990; 



Glvnn and Whitt 



19911 1 . On the other hand, if the batch size and the number 



of batches are allowed to increase as the overall length of the simulation d oes it may 



be po ssible to obtain consistency. The first result in this direct ion is due to 



Damerdji 



Damerdii 



fll9_9J) is the 



(1994) which we now describe. The major assumption made by 
existence of a strong invariance principle. Let B = {B(t),t > 0} denote a standard 
Brownian motion. A strong invariance principle holds if there exists a nonnegative in- 
creasing function 7(n) on the positive integers, a constant < o g < oo and a sufficiently 
rich probability space such that 



y^ffPQ) - nE^g - a g B[ 



n 



i=i 



0("y(n)) w.p. 1 as n — > oo 



(15) 



Damerdii 



where the w.p. 1 in (JT5j) means for almost all sample paths. In particular 
(1994) assumed (fi3j) held with 7(71) = n l l 2 ~ a where < a < 1/2. However, it would 
seem a daunting task to directly check this condition in any given application. In an 
attempt to somewhat alleviate this difficulty we have the following lemma. 

Lemma 2. Let g : X — > R be a Borel function and let X be a Harris ergodic Markov 
chain with invariant distribution tt. 

1. If X is uniformly ergodic and E 7r |(yf| 2+<5 < 00 for some 5 > then ()15j) holds with 

7 (n) = n x l 2 - a where a < 5/(24 + 125). 

2. If X is geometrically ergodic, holds and E^gl 4 " 4 " 5 < 00 for some 5 > then 
(|T5j) holds with 7(71) = n a logn where a = 1/(2 + 5). 

Proof. The first p a rt of the lemma is an immediate consequence of Theorem 4.1 of 



Philipp and Stoutl (|l975l ) and the fact that uniformly ergodic Markov chains enjoy 



exponentially f ast uniform mixing . The second part follows from our Lemma and 



Theorem 2.1 in 



Csaki and Csorgol (|l995l ). 



□ 



Using part 1 of Lemma El we can state Damerdji's result as follows. 
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Proposition. ( Darner diil . [1994) Assume g : X — > R such that E^lgl 2 ^ < oo for some 
5 > and let X be a Harris ergodic Markov chain with invariant distribution 7r. Further, 
suppose X is uniformly ergodic. If 

1. a n — > oo as n — > oo, 

2. b n — * oo and 6„/n — > as n — > oo, 

3. b^n 1 ' 201 log n — > as n — > oo where a G (0,5/(24 + 125)) and 

4. there exists a constant c > 1 such that J2 n (bn/n) c < oo 

then as n — > oo, a 2 BM — > cr| w. p. 1. 

In Appendix[Blwe use part 2 of Lemma[2]to extend Proposition 13. 21 to geometrically 
ergodic Markov chains. 

Proposition. Assume g : X — > R such that E 7r |(yf| 4+<5 < oo for some 5 > and let X 
be a Harris ergodic Markov chain with invariant distribution n. Further, suppose X is 
geometrically ergodic. If 

1. a n — >• oo as n — > oo, 

2. 6 n — >• oo and 6 n /n — > as n — > oo, 

3. 6~ 1 n 2o [logra] 3 — > as n — > oo where a = 1/(2 + 5) and 

4. there exists a constant c > 1 such that J2 n (^n/ n ) c < °o 
then as n — > oo, d"^ M — >• o 2 w. p. 1. 

Remark 5. There is no assumption of stationarity in Propositions 13.21 or 13.21 Hence 
burn-in is not required to implement CBM. 

Remark 6. Consider using b n = \n e \ and a n = \n/b n \. Proposition 13.21 requires that 
l>0>l-2a>l - 5/(12 + 65) > 5/6 but Proposition l3~2l requires only 1 > 9 > 
(1 + 5/2)- 1 > 0. 
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Under the conditions of Propositions 13.21 or 13.21 an asymptotically valid fixed- width 
procedure for estimating E n g results if we terminate the simulation the first time 

ta n ~i—j= + p{n) < e 

where £ a „-i is the appropriate quantile from a student's t distribution with a n — 1 
degrees of freedom. 



3.3 Practical Implementation Issues 



Making practical use of the preceding theory requires (i) a moment condition; (ii) 
establishing geometric ergodicity of the sampler at hand; (iii) choosing p(n)\ (iv) using 
RS requires Q or at least (JTUJ); and (v) CBM requires choosing a n and b n . 

Since a moment condition is required even in the iid case we do not view (i) as 
restrictive. Consider (ii). It is easy to construct e xamples where the convergence rate 



1999T ) so the importance of 



is so slow that a Markov chain CLT does not hold f Ro berts! 
establishing the rate of convergence in (jBJ) should not be underestimated. On the other 
hand, the MCMC community has expended considerable effort in trying to understand 
when certain Markov chains are geometrically ergodic; see the references in Remark 01 
In our view, this is not the obstacle that it once was. 

Regarding (iii), we know of no work on choosing an optimal p(n). Recall that the 
theory requires p(n) = o(n" 1 ^ 2 ). In our examples we use p(n) = el(n < n*) where 
n* > is fixed. Since n* is typically chosen based on empirical experience with the 
sampler at hand we might want a penalty for sample sizes greater than n* so another 
reasonable choice might be p(n) = el{n < n*) + CrT k for some k > 1/2 and C > 0. 

The issue in (iv), i.e., calculating Q or (fTU|) is commonly viewed as overly bur- 
densome. Ih2weverj_Jnoui_e2£perience, this calculation need not be troublesome. For 



example, 



Mvkland et al. 



( 19951 ) give recipes for constructing and (fTUj) for Metropolis- 



Hastings independence and random walk samplers; recall (jllj) . There is also some work 
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on es tablishing these conditions for very 



nally, 



Brockwell and Kadand ([20051 ) and 



general models; see 



Hobert et al. 



Gever and Thompson 



(2005j). Fi 



1995) have shown that 



regenerations can be made to occur naturally via simulated tempering. 

Consider (v). As we noted in Remark El it is common to cho ose the batch sizes 



according to b n = \n d \ for some 6. 



Song and Schmeiserl (jl995h and 



Chienl (J19.88) have 



addressed the issue of what va l ue of 6 should be used from different theoretical points 
of view. In particular, IChienl (jl988f ) showed that (under regularity conditions) using 
9 = 1/2 results in the batch means approaching asymptotic normality at the fastest rate. 



Song and Schmeiserl (jl995T ) showed that (under different regularity conditions) using 
9=1/3 minimizes the asymptotic mean-squared error of o 2 BM . Note that Remark H 
shows that 6 = 1/3 requires a stronger moment condition than 6 = 1/2. We further 
address this issue in Section HJ 



3.4 Alternatives to BM and RS 



We chose to focus on BM and RS since in MCMC settings they seem to be the most com- 
mon methods for estimating the variance of the asymptotic normal distribut ion. How- 



ever, there are other methods w 



(1991) 



Damerdji 



Geverl (19921 ) 



ri ch m ay en 



Nummelin 



oy strong consistency; e.g . see 



Peligrad and Shad (jl995T ). In particular 



Damerdii 



(2MM) and 

199lh uses a strong invariance principle to obtain strong consistency of cer- 
tain spectral variance estimators under conditions similar to those required in Proposi- 
tion E21 Apparently, this can be extended to geometrically ergodic chains via Lemma El 
to obtain a result with regularity conditions similar to Proposition 13.21 However, we 
do not pursue this further here. 
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4 Examples 



In this section we investigate the finite sample performance of RS, BM with 30 batches, 
and CBM with b n = [n 1 ' 3 ] and b n = L^ 1 ^ 2 ] in four examples. In particular, we 
examine the coverage probabilities and half-widths of the resulting intervals as well 
as the required simulation effort. While each example concerns a different statistical 
model and MCMC sampler there are some commonalities. In each case we perform 
many independent replications of the given MCMC sampler. The number of replications 
ranges from 2000 to 9000 depending on the complexity of the example. We used all 
methods on the same output from each replication of the MCMC sampler. When the 
half-width of a 95% interval with p(n) = el(n > n*) (or p(R) = eI(R > R*) for RS) is 
less than e for a particular method, that procedure was stopped and the chain length 
recorded. Our choice of n* is different for each example and was chosen based on our 
empirical experience with the given Markov chain. Other procedures would continue 
until all of them were below the targeted half-width, at which time a single replication 
was complete. In order to estimate the coverage probabilities we need true values of 
the quantities of interest. These are not analytically available in three of our examples. 
Our solution is to obtain precise estimates of the truth through independent methods 
which are different for each example. The details are described below. The results are 
reported in Table 121 



4.1 Toy Example 

Consider estimating the mean of a Pareto(a,/3) distribution, i.e., a/3/(/3 — 1), (3 > 1, 
using a Metropolis-Hastings independence sampler with a Pareto(a, A) candidate. Let 
7r be the target density and v be the proposal density. Assume (3 > A. Then for x > a 

u(x) A A 
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By Theorem 2.1 in iMengersen and Tweedid (|1996l ) this sampler is uniformly ergodic 
and 

\P"(-r.-) k(-)\\ < [ .1.-4 



P, 

In order to ensure the moment conditions required for Proposition 13.21 we set /3 — 10 
and A = 9 in which case the right hand side is 10~ n . Hence this sampler converges 
extremely fast. Implementation of RS was accomplished using (fTTJ) with c = 1.5. 

4.1.1 Comparing convergence diagnostics with CBM 

As noted by a referee, one method f or terminating the simulation is via convergence 



diagnostics. Consider the method of 



Gewekei (jl992l ) which is a diagnostic that seems 



close in spirit to the current work. Geweke's diagnostic (GD) is based on a Markov 
chain CLT and hence does not apply much more generally than CBM; the same can be 
said for many other diagnostics. GD uses a hypothesis test to ascertain when g n has 
stabilized. 

In the remainder of this subsection we compare GD and CBM in terms of mean- 
squared error (MSE) and chain length. To this end we ran 9000 independent replications 
of the independence sampler with a — 1, /3 = 10 and A = 9. We used CBM and GD 
on the output in the following manner. For each replication we set n* = 45 but the R 
package boa required a minimum of 120 iterations in order to calculate GD. After the 
minimum was achieved and the cutoff for a particular method was attained we noted the 
chain length and the current estimate of E n g. The cutoff for CBM was to set the desired 
half-width to e = .005. The result of using GD is a p-value. We chose four values (.05, 
.10, .2 and .4) for the threshold in an attempt to tune the computation. The results 
are reported in Table As we previously noted, this sampler mixes extremely well. 
Thus it is not surprising that using GD results in a small estimated MSE. However, 
using CBM results in much smaller MSE than GD. The average chain lengths make it 
is clear that GD stops the simulation much too soon. Moreover, changing the p-value 
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Method 


Cutoff 


Estimated MSE 


Average Chain Length 


f "J -w tit/1 I 1 /Q I \ 

CBM (b n = [n 1/3 \) 


e = .005 


6.65 x 10~ 6 (9.9 x 10~ 8 ) 


2428 (5) 


/- ^ T tit/1 1 1 /O I \ 

CBM (6 n = ' J) 


e = .005 


7.34 x 10~ 6 (1.2 x 10~ 8 ) 


2615 (3) 


Geweke 


p- value =.4 


1.17 x 10" 4 (2 x 10~ 6 ) 


202.6 (3.4) 


Geweke 


p- value =.2 


1.30 x 10" 4 (2 x 10~ 6 ) 


148.9 (1.6) 


Geweke 


p- value =.1 


1.34 x 10- 4 (2 x 10" 6 ) 


133.4 (.9) 


Geweke 


p- value =.05 


1.37 x 10- 4 (2 x 10" 6 ) 


127.4 (.5) 



Table 1: Summary statistics for CBM versus GD for Example 4.1. Standard errors of 
estimates are in parentheses. 

threshold for GD does not result in substantial improvements in estimation accuracy. 



4.2 A Hierarchical Model 



Efron and Morris! ([19751 ) present a data set that gives the raw batting averages (based 
on 45 official at-bats) and a transformation (a/45 arcsi n (2a; — 1)) for 18 Major League 



Baseball players during the 1970 season. 



Rosenthal! (|l996h considers the following 



conditionally independent hierarchical model for the transformed data. Suppose for 
i — 1 , . . . , K that 



y^-N^i) 

A ~ IG(2,2) 



^|/i,A~N(/i,A) 
/O) oc 1 . 



(16) 



(Note that we say W ~ Gamma(a, 0) if its densit y is proportional t o w a ~ l e~ l3w I{w > 0) 
and if X ~ Gamma(6, c) then X~ l ~ IG(6, c).) iRosenthall (J1996J) introduces a Harris 
ergodic block Gibbs sampler that has the posterior, tt(9, //, X\y), characterized by the 
hierarchy in (|16p as its invariant distribution. This Gibbs sampler completes a one-step 
transition (A', //,#') — > (X,fi,9) by drawing from the distributions of X\9' then fi\9', A 
and subsequently 9\[i, \. The full conditionals needed to implement this sampler are 
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given by 



9i\X,/i, y : ~ N 



A + l A + l, 

Rosenthal proved geometric ergodicity of the associated Markov chain. However, MCMC 
is not required to sample from the posterior; in Appendix[n]we develop an accept-reject 
sampler that produces an iid sample from the posterior. Also in Appendix Owe derive 
an expression for the probability of regeneration IjlOj) . 

We focus on estimating the posterior mean of 6g, the "true" long-run (transformed) 
batting average of the Chicago Cubs' Ron Santo. It is straightforward to check that the 
moment conditions for CBM and RS are met. Finally, we employed our accept-reject 
sampling algorithm to generate 9 x 10 7 independent draws from n(8g\y) which were 
then used to estimate the posterior mean of 8g which we assumed to be the truth. 



4.3 Calculating Exact Conditional P-Values 



Agr esti (2002, p. 432) reports data that correspond to pairs of scorings of tumor ratings 
by two pathologists. A linear by linear association model specifies that the log of the 
Poisson mean in cell i,j satisfies 

log/iij = a + Pi + 7j + 5 ij . 

A parameter free null distribution for testing goodness-of-fit is obtained by condition- 
ing on the sufficient statistics for the parameters, i.e., the margins of the table and 
Tlij n ij^i w here the are the observed cell counts. The resulting conditional dis- 
tribution is a generalization of the hypergeometric distribution. An exact p-value for 
goodness-of-fit versus a saturated alternative can be calculated by summing the con- 
ditional probabilities of all tables satisfying the margins and the additional constraint 
and having deviance statistics larger than the observed. 
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For the current data set there are over twelve billion tables that satisfy the mar- 
gin constraints but an exhaustive search revealed that there are only roughly 34,000 
tables that also satisfy the constraint induced by . ij. We will denote this set of 
permissible tables by T. Now the desired p-value is given by 



J2l[d(y)>d(y ob s)]n(y) 



(17) 



where d(-) is the deviance function and ir denotes the generalized hypergeometric. Since 
we have enumerated T we find that the true exact p-value is .044 whereas the chi- 
squared approximation yields a p-value of .368. However, a different data set with 
different values of the sufficient statistics will have a different reference set which must 
be enumerated in order to find the exact p-value. This would be too computationally 
burdensome to impleme nt generally and hence it is common to resort to MCM C-based 



approximations (see e.g. 



Caffo and Booth . 



2001 



Diaconis and SturmfeL 



1998). 



To estimate (JT7jl we will use the Metropolis-Hastings algorithm developed in 



Caffo and Booth 



(2001). This algorithm is also employed by the R package exactLoglinTest. The as- 
sociated Markov chain is Harris ergodic and its invariant distribution is the appropriate 
generalized hypergeometric distribution. Moreover, the chain is uniformly ergodic and 
since we are estimating the expectation of a bounded function the regularity conditions 
for both RS and CBM are easily met. 

Implementation of RS is straightforward. As we mentioned earlier, in finite state 
spaces regenerations occur whenever the chain returns to any fixed state. In order to 
choose the fixed state we ran the algorithm for 1000 iterations and chose the state which 
had the highest probability with respect to the stationary distribution. The same fixed 



state was used in each replication. 
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4.4 A Model-Based Spatial Statistics Application 

i 1 i — 



Consider the Scottish lip cancer data set ([Clavton and Kaldorl . 119871 ) which consists 



of the number of cases of lip cancer registered in each of the 56 (pre-reorganization) 
counties of Scotland, together with the expected number of cases given the age-sex 
structure of the population. We assume a Poisson likelihood for areal (spatially aggre- 
gated) data. Specifically, for i = 1,....,N we assume that given \i { the disease counts 
Yi are conditionally independent and 

Yi\m ~ Poisson(£;e Mi ) (18) 

where is the known 'expected' number of disease events in the zth region assuming 
constant risk and //j is the log-relative risk of disease for the zth region. Set (p = 
(0i, . . . , 4>n) T ■ Each Hi is modeled as /ij = 9{ + (pi where 

9i\r h ~ N(0,l/r fc ), #r c ~ CAR(r c ) oc r^ 2 exp (-^0 T Q0) , and 

Hi if i = j 
Qij — \ if i is not adjacent to j 
— 1 if z is adjacent to j 
with m the number of neighbors for the ith region. Each 9i captures the zth region's 
extra-Poisson variability due to area-wide heterogeneity, while each (pi captures the ith 
region's excess variability attributable to regional clustering. The priors on the precision 
parameters are ~ Gamma(l, .01) and r c ~ Gamma(l, .02). This is a challenging 
model to consider since the random effects parameters (pi) are not identified in the 
likelihood, and the spatial prior used is improper. Also, no closed form expressions are 
available for the marginal distributions of the parameters, and the posterior distribution 
has 2A^ + 2 dimensions (114 for the lip cancer data). 



Haran and Tiernevl (J2004J) establish uniform ergodicity of a Harris ergodic Metropolis- 
Hastings independence sampler with invariant distribution ir(6,(p,Th,T c \y) where 9 = 
(9i, . . . , 9n) t and a heavy-tailed proposal. In our implementation of RS we used the 
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formula for the probability of a regeneration given by (fTTj) with logc = —342.72. Using 
the empirical supremum of the ratio of the invariant density to the proposal density 
(based on several draws from the proposal) guided the choice of c. 

We focus on estimating the posterior expectation of <ftj, the log-relative risk of disease 
for County 7 attributable to spatial clustering. Finally, we used an independent run of 
length 10 7 to obtain an estimate which we treated as the 'true value'. 

4.5 Summary 

Table 121 reveals that the estimates of the coverage probabilities are all less than the 
desired .95. However, examining the standard errors shows that only BM is signifi- 
cantly less in all of the examples and the estimated coverage probability for RS is not 
significantly different from .95 in 3 out of 4. The story for CBM is more complicated 
in that the coverage depends on the choice of b n . Using b n = [^ 1//3 J gives the best 
coverage for the examples in Sections 14. II and 14. 21 while b n = [n 1 ^ 2 ] is superior for those 
in Sections 14.31 and 14.41 The reason for this is that the Markov chains in Sections 14.11 
and 14. 21 mix exceptionally well and hence smaller batch sizes can be tolerated. However, 
the examples in Sections 14.31 and 14.41 are realistic problems and hence the chains do not 
mix as well so that larger batch sizes are required. Thus we would generally recommend 
using b n = \n 1 / 2 \ . 

The example in subsection 14.31 deserves to be singled out due to the low estimated 
coverage probabilities. The goal in this example was to estimate a fairly small proba- 
bility, a situation in which the Wald interval is known to have poor coverage even in 
iid settings. 

While RS and CBM appear comparable in terms of coverage probability RS tends to 
result in slightly longer runs than CBM which in turn results in longer runs than BM. 
Moreover, RS and CBM are comparable in their ability to produce intervals that meet 
the target half-width more closely than BM. Also, the intervals for RS are apparently 
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more stable than those of CBM and BM. Finally, BM underestimates the Monte Carlo 
standard error and therefore suggests stopping the chain too early. 

While RS has a slight theoretical advantage over CBM their finite sample properties 
appear comparable. Also, like RS, CBM avoids the burn-in issue, which has been a 
long standing obstacle to MCMC practitioners. In addition, CBM enjoys the advantage 
of being slightly easier to implement. Thus CBM clearly has a place in the tool kit of 
MCMC practitioners. 



A Proof of Lemma U 



A.l Preliminary Results 

Recall the split chain X' and that = To < T\ < T2 < ■ ■ ■ denote the regeneration 
times; i.e., r r+ i = min{i > r r : = 1}. 



Lemma 3. (jHobert et al. 



20021 Lemma 1) Let X be a Harris ergodic Markov chain and 



assume that (J0J) holds. Then for any function h : X c 



E 7T \h(X ,X 1 ,...)\>cE Q \h(X ,X 1 ,...) 



where c = E n s. 



Lemma 4. f Hobert et al. 



20021 Lemma 2) Let X be a Harris ergodic Markov chain and 
assume that (JHJ) holds. If X is geometrically ergodic, then there exists a (3 > 1 such 
that E n (3 T1 < oo. 

Corollary 1. Assume the conditions of Lemma 0] For any a > 

oo oo 

£ [Pr.^ > i + l)f < (E^r$>- a(m) < oo . 



i=Q 
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A. 2 Proof of Lemma [T] 



We prove only part 2 of the lemma as part 1 is similar. Without loss of generality we 

assume < 5 < 1. By LemmaEl it is enough to verify that E^rf < oo and EnS p+S < oo. 

Lemma |U shows that E^rf < oo for any p > 0. Note that 
'n-l 



P+<5 



p 

j=i 



P+<5 



i=0 

oo oo oo 



sE -E E 

ii=0 i p =0ip+i=0 

and hence 



/(o<v + i<n-i)l^(x v+1 ; 



oo oo oo 



Ewr<£-E E E - 

U=0 ip=0i p +i=0 



p+1 



n j (°<^< r i-i) 



J=l 



n 



oo oo oo 



<E-EE [E^o^^n-iM^JI 2 ] 

il=0 ip=0j p +i=0 



21 1/2 



X 



• • • x ^/(O < i p < n - l)\g(X lp )\ 2P ] 1/2P [E./(0 < i p+1 <r x - l)\g(X lp+1 )\ 2PS ] 
where the second inequality follows with repeated application of Cauchy-Schwartz. Set 
dj = 1 + 2^/5 and bj — 1 + 5/2^ for j = 1, 2, ... ,p and apply Holder's inequality to 
obtain 

lfy 



2P51 V2 p 



EJ(o < i 5 < n - 1)|<?PQ.)| 2J < [E./(o < ij- < n - l)] 1/a > E^(X, 



|2J+<5 



Note that 



Cj :-- 



1 !/2 p 



< OO . 



Also, if a p+1 = 1 + 2 p and b p+l = 1 + 1/2 P then 

E./(0 < i P+1 < Tl -l)\g(X ip+1 )f s < [EJ(0 < i p+l < Tl - 1)]^ [E^(X p+1 )| 5 ( 2P + 5 )]^ 



Notice that 



1/2P 



< OO 



and set c = max{ci, . . . , c p+ i}. Then an appeal to Corollary ^ yields 

p oo 



E^ 5 < c 



nE^^ +i )} i/M) 

j=li,-=0 



V {Pr(n > + l)} 1 /^^) 

ip+l=0 



< oo . 
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B Proof of Proposition 13.2 



B.l Preliminary Results 



Recall that B = {B(t),t > 0} denotes a standard Brownian motion. Define 



b, 



On — l 



(19) 



3=0 



where B^b n ) = 6" 1 {B{(j + l)b n ) - B{jb n )) and B{n) = n^B^). 



Lemma 5. ((Damerdii . 



1994 p. 508) For all e > and for almost all sample paths there 



exists n (e) such that for all n > n (JDamerdji, 



1994 p. 508) 



< V2(l + e)b- l ' 2 [\og{n/b n ) + loglogn] 1 / 2 



(20) 



Lemma 6. (jCsorgo and Revesa . 119811 1 For all e > and for almost all sample paths 
there exists n (e) such that for all n > no 



\B{n)\ < (1 + e)[2nloglogn] 1/2 . 



(21) 



B.2 Proof of Proposition 1X21 



Proposition 13.21 follows from Lemma 121 and the following two lemmas: 



Lemma 7. ([Damerdii 



1994 Proposition 3.1) Assume 



1. b n — > oo and n/b n — > oo as n — > oo and 

2. there exists a constant c > 1 such that Y^ n (bn/ n ) c < °o 
then as n — > oo, ex 2 — > 1 a.s. 

Lemma 8. Assume that (JT5j) holds with 7(n) = n" logn where a = 1/(2 + 5). If 

1. a n — > oo as n — > oo, 

2. 6 n — > oo and n/6 n — > oo as n — > oo and 

3. b^n 2 " [log n] 3 — > as n — > oo where a = 1/(2 + 5) 
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then as n — ■> oo, a 2 BM — o 2 o 2 — > a.s. 



Proof of Lemma |H1 Recall that X = {Xx,X 2 , . . .} is a Harris ergodic Markov chain. 
Define the process Y" by Yi — <7pQ) — E v g for i = 1,2,3,.... Then 

7 a n — 1 



On - 



T E (r# n )-y(n))' 



3=0 



where = Y%Li Y jb n +i for j = 0, . . . , a n - 1 and F(n) = n 1 ^"=i ^ Since 



y)(6 n ) - Y(n) = ^-(M - Y(n) ± ± a 9J B(n) 



we have 



a„ — 1 



n J- . n 

3=0 



+ |2(Yj-(& n ) - (7,^(6 n ))(F(n) - a g B(n))\ + |2a 9 (F J (6 n ) - a :l I!,(b h )) I!,ib, 
+ \2a g (Yj(b n ) - ( r g S i (6 ri ))S(n)| + |2<r 5 (Y(n) - <r 9 B(n))5 3 (6 n )| 
+ |2a 9 (F(n) -a g B{n))B{n)\] . 
Now we will consider each term in the sum and show that it tends to 0. 



1. Our assumptions say that there exists a constant C such that for all large n 



g(Xj) - nE T g - a g B(n) 



i=l 



< Cn a \ogn a.s. 



(22) 



Note that 



Yjip n ) ~ o- g Bj(b n ) = — 

On. 



E *i-^a(o-+i)&„) 

i=l 



i=l 



and hence by 



{j+l)b n jb n 
E ^ - ^(tf + !) & n)l + I E ^ " a 9 B UK 



1=1 



t=l 



< —Cn a \ogn 



(23) 



26 



Then 



, a„ — 1 

a 



-L . „ tin J- 



n) 2 ^0 



j=0 



as n — > oo by conditions 1 and 3. 
2. Apply (J22D to obtain 



\Y(n) - a g B(n)\ = n^^Y, - a g B(n)\ < Cn " 1 logn. (24) 



i=l 

Then 

a n — 1 



^ J £(?{n)-v g B{n))*<e 



2 



a„ — 1 ^-^ v '' a„ — 1 n n 1 2a 

n j=o n 

as n — > oo by conditions 1 and 2 and since 1 — 2a > 0. 
3. By (J21 and (|2ljl 

|2(YK& n ) - cx^- (&„))(>» " 0fl£(n))| < 4C 2 &- 1 n 2Q - 1 (logn) 2 . 

Thus 



\m(b n ) - a g Bj{b n ))(Y{n) - a g B{n))\ < AC 2 

0"a J- • n 



a n - 1 n 1 - 2 " 

3=0 

as Ti — > oo by condition 1 and since 1 — 2a > 0. 
4. Since 6 n > 2, (JH|) and flU together imply 

|(^(6„) - a s S,(6 n ))S,(fo n )| < 2 3 / 2 C(l + e)^ 1 [ ;V«(logn) 2 log(n/ 0n ) 

H-o^n 2 " (log n) 2 log log n] ^ 2 

Hence 

; a n — 1 



On 

j=0 



- J2 \^s(Yj(M -^i(6n))4-(6n)l < 8a 3 C(l + e)^- fcV^log n) 2 bg(n/6 r , 

+ fe-V a (logn) 2 loglogn] 1/2 ^0 



as n — > oo by conditions 1 and 3. 
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5. By (J23J) and (j2U) \{Yj{b n )-a g Bj{b n ))B{n)\ < ^(l+e^n^+^lognXloglogn) 1 / 2 
so that 

bn n /v^j, ^ n ft, war M/fi | \ «n (log n) (log log n) 1/2 

t£rj X Cir) J- ft 

j=0 

as n — > oo by condition 1 and since 1/2 — a > 0. 

6. Use (J201) and (J2H) to get 

\(Y(n) - a.BWBjiK)} < V2C(1 + e ) V_^ [log(n/6 n ) + loglogn] 1/2 



and hence using conditions 1, 2 and 3 shows that as n —>■ oo 



h 



a 



n 



I J2 \ 2a 9(Y(n) - a g B{n))Bj{b n )\ < 



3=0 



Aa g C(l + e)^-^ [6- 1 n Jta ((logn) a log(n/6 B ) + (log n) 2 log log n)] 1/2 - 
a„ — 1 n L J 



7. Now d2H> and (J2U) imply |(y(n) - cr ff .B(n)).B(n)| < 2C(1 + e)n- 3 / 2+Q (logn) 3 / 2 . 
Hence 

^ £ |2a fl (F(n) - a 9 B(n))B(n)\ < ^ 9 C{1 + e )^ T ^ il ^ ! - 

n j=0 n 
as n — >• oo by conditions 1 and 2 and since 1/2 — a > 0. 



C Calculations for Example 14.2 



We consider a slightly more general formulation of the model given in (JTHJ). Suppose 
for i — 1, . . . , K that 

Yi\9i~N(9i,a) 9i\n,\~N(ji,\) (25) 
A~IG(6,c) /(yu)ocl. 

where a, b, c are all known positive constants. 
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C.l Sampling from 7r(0, (i, A \y) 

Let 7r(8, p, X\y) be the posterior distribution corresponding to the hierarchy in (J2*5|) . 
Note that # is a vector containing all of the 9i and that y is a vector containing all of 
the data. Consider the factorization 

tt(0, /i, A|i/) = tt(%, A, y)ir{p\X, 2/)7r(A|y). (26) 

If it is possible to sequentially simulate from each of the densities on the right-hand side 
of (|2l)|) we can produce iid draws from the posterior. Now n(9\fi, A, y) is the product of 
independent univariate normal densities, i.e. 9i\p,X,y ~ N((Xyi + ap)/(X + a), aA/(A + 
a)). Also, 7r(/x|A, y) is a normal distribution, i.e. /i|A,y ~ N(y, (A + a)/K). Next 

1 l2/j A^(A + a) (x - 1)/2 

where f/ = K~ l Y^f = iVi and s 2 = J^iLiG/* ~ ^) 2 - ^n accept-reject sampler with an 
IG(6, c) candidate can be used to sample from n(X\y) since if we let g(X) be the kernel 
of an IG(6, c) density 

sup — — i ^-^ e -c/^ 2 l^+a) = (A + a) (l-^)/2 e - S V2(A+a) = M < OO 

x >o g(X)X b + 1 (X + a)( K - i y 2 A >o V 

It is easy to show that the only critical point is A = s 2 /(K — 1) — a which is where the 
maximum occurs if A > 0. But if A < then the maximum occurs at 0. 



C.2 Implementing regenerative simulation 



We begin by establishing the minorization condition Q for 



Rosenthal 's J1996J) block 



Gibbs sampler. For the one-step transition (A', pf, 8') — > (A, p, 9) the Markov transition 



density, p, is given by p(A, //, 0| A', p', 9') = /(A, /i|#')/(#|A, p). Note that X 



x JK X x 
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i K . Fix a point (A, fl, 0) 6 X and let D C X. Then 
p(A,^|AV^0=/(A,^')/(#|A,/i) 

^/(A^I^O/^IA,//)/^)^} 



(A, M ,e)eD f{\fji\e) 
and hence (jHJ) will follow by setting 



[(A, M ,e)GD f(\,u\6) J 



f(\,fi\9)f(6\\,fi) dX d\i d9, 

D 

s(X',^,9')=e inf ^TT^T and ?(A, = ^/(A, /i|^)/(0|A, //)J {(W)6l?} . 

(A,/x,6»)eD /(A,/z|6>) 

Now using (jiup shows that when (A,/i, #) G -D the probability of regeneration is given 
by 

Pr(*=l|AV,*,A,**) = { inf ^CU^^f (27) 

Thus we need to calculate the infimum and plug into (|2Tj) . To this end let < di < 

d 2 < oo, — oo < c?3 < c?4 < oo and set D = [di,d 2 ] x [ofa,^] x R-^. Define V(#, /i) = 
5^i=i(^» ~ I 1 ) 2 an d note that 

inf /(a,^;) = inf ^(mri-^ril fv ( g-,«-/(y,A) 

(W)ei> f(\,ii\B) Ae[di,*i], M6[d 3 ,d 4 ] 2A j l 2 A 

where /} = d 4 I(0' < I) + d 3 I(0' > I) and A = d 2 I(V(6 , 1 jl) < V(9,fi)) + di/(V(0',/i) > 
V(0,/t)). We find the fixed point with a preliminary estimate of the mean of the 
stationary distribution, and D to be centered at that point. Let (A, /i, 6) be the ergodic 
mean for a preliminary Gibbs sampler run, and let S\ and denote the usual sample 
standard deviations of A and fi respectively. After some trial and error we took di = 
max | .01, A — .5S\ j, d 2 = A + .5S\, d 3 = £l — and <i 4 = ji + S^. 
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Table 2: Summary statistics for BM, CBM and RS. Standard errors of estimates are 



in parentheses. 
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